Features for entity coreference resolution
TODO: from Recasens and Hovy (2009) [Recasens, M., & Hovy, E. (2009). A deeper look into features for coreference resolution. Lecture Notes in Computer Science, 5847 LNAI, 29–42. http://doi.org/10.1007/978-3-642-04975-0_3]:

"The feature set representing m1 and m2 that was employed in the decision tree learning algorithm of [4] has been taken as a starting point by most subsequent systems. It consists of only 12 surface-level features (all boolean except for the first): (i) sentence distance, (ii) m1 is a pronoun, (iii) m2 is a pronoun, (iv) string match (after discarding determiners), (v) m2 is a definite NP, (vi) m2 is a demonstrative NP, (vii) number agreement, (viii) WordNet semantic class agreement, (ix) gender agreement, (x) both m1 and m2 are proper nouns (capitalized), (xi) m1 is an alias of m2 or vice versa, and (xii) m1 is an apposition to m2. The strongest indicators of coreference turned out to be string match, alias, and appositive.

Ng and Cardie [5] expanded the feature set of [4] from 12 to a deeper set of 53, including a broader range of lexical, grammatical, and semantic features such as substring match, comparison of the prenominal modifiers of both mentions, animacy match, WordNet distance, whether one or both mentions are pronouns, definite, embedded, part of a quoted string, subject function, and so on. The incorporation of additional knowledge succeeds at improving performance but only after manual feature selection, which points out the importance of removing irrelevant features that might be misleading. Surprisingly, however, some of the features in the hand-selected feature set do not seem very relevant from a linguistic point of view, like string match for pronominal mentions.
More recent attempts have explored some additional features to further enrich the set of [5]: backward features describing the antecedent of the candidate antecedent [11], semantic information from Wikipedia, WordNet and semantic roles [12], and most notably, Uryupina’s [8] thesis, which investigates the possibility of incorporating sophisticated linguistic knowledge into a data-driven coreference resolution system trained on the MUC-7 corpus. Her extension of the feature set up to a total of 351 nominal features (1096 boolean/continuous) leads to a consistent improvement in the system’s performance, thus supporting the hypothesis that complex linguistic factors of NPs are a valuable source of information. At the same time, however, [8] recognizes that by focusing on the addition of sophisticated features she overlooked the resolution strategy and some phenomena might be over-represented in her feature set. Bengtson and Roth [9] show that with a high-quality set of features, a simple pairwise model can outperform systems built with complex models on the ACE dataset. This clearly supports our stress on paying close attention to designing a strong, linguistically motivated set of features, which requires a detailed analysis of each feature individually as well as of the interaction between them. Some of the features we include, like modifiers match, are also tested by [9] and, interestingly, our ablation study comes to the same conclusion: almost all the features help, although some more than others."

Taxonomy

Mention

From Culotta et al. (2007) [Culotta, A., Wick, M., & McCallum, A. (2007). First-Order Probabilistic Models for Coreference Resolution. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference (pp. 81–88). Rochester, New York: Association for Computational Linguistics.]:
* Match features - Check whether gender, number, head text, or entire phrase matches
* Mention type (pronoun, name, nominal)
* Aliases - Heuristically decide if one noun is the acronym of the other
* Apposition - Heuristically decide if one noun is in apposition to the other
* Relative Pronoun - Heuristically decide if one noun is a relative pronoun referring to the other.
* Wordnet features - Use Wordnet to decide if one noun is a hypernym, synonym, or antonym of another, or if they share a hypernym.
* Both speak - True if both contain an adjacent context word that is a synonym of “said.” This is a domain-specific feature that helps for many newswire articles.
* Modifiers Match - for example, in the phrase “President Clinton”, “President” is a modifier of “Clinton”. This feature indicates if one noun is a modifier of the other, or they share a modifier.
* Substring - True if one noun is a substring of the other (e.g. “Egypt” and “Egyptian”).

Type in an ontology (Prokofyev et al., 2015) [Prokofyev, R., Tonon, A., Luggen, M., Vouilloz, L., Difallah, D. E., & Cudré-Mauroux, P. (2015). SANAPHOR: Ontology-based coreference resolution. Lecture Notes in Computer Science, 9366, 458–473. http://doi.org/10.1007/978-3-319-25007-6_27]

Mention position

Saliency: Lappin and Leass (1994) [Lappin, S., & Leass, H. (1994). An algorithm for pronominal anaphora resolution. Computational Linguistics, 20(4), 535–561.]

Recency: textual (distance in sentences, distance in words) (TODO: ref?) or psychological (recently used) (Webster and Curran, 2014) [Webster, K., & Curran, J. R. (2014). Limited memory incremental coreference resolution. In COLING (pp. 2129–2139).]

Instance-level knowledge

Knowledge about specific people, organizations, etc., can be queried from Wikipedia, DBpedia, Freebase, and similar resources.

Syntax

TODO: Versley et al. (2008) [Versley, Y., Moschitti, A., Poesio, M., & Yang, X. (2008). Coreference Systems based on Kernels Methods. COLING 2008, 961–968.]

Event structure (argument-predicate, selectional preference)

Dagan and Itai (1990) [Dagan, I., & Itai, A. (1990). Automatic Processing of Large Corpora for the Resolution of Anaphora References. In Proceedings of the 13th Conference on Computational Linguistics - Volume 3 (pp. 330–332). Stroudsburg, PA, USA: Association for Computational Linguistics. http://doi.org/10.3115/991146.991209] use a simple co-occurrence table to establish the semantic compatibility between a candidate noun and the context of an anaphoric pronoun. Their example sentence is: "They know full well that the companies held tax money aside for collection later on the basis that the government said it was going to collect it." In a corpus, ''government'' is observed as the subject of ''collect'' and ''money'' as the object of ''collect''. The algorithm therefore resolves the first ''it'' to ''government'' and the second to ''tax money''.

Dagan et al. (1995) [Dagan, I., Justeson, J., Lappin, S., Leass, H., & Ribak, A. (1995). Syntax and lexical statistics in anaphora resolution. Applied Artificial Intelligence, 9(6), 633–644.] proposed using similar statistics in a post-processing step.

Kehler et al. (2004) [Kehler, A., Appelt, D., Taylor, L., & Simma, A. (2004). The (non)utility of predicate-argument frequencies for pronoun interpretation. HLT-NAACL.] evaluated argument-predicate statistics both as training features and as a post-processing step, and found them ineffective in both settings. They tried to alleviate the data sparsity problem with web counts, but performance did not improve. The evaluation was done on the ACE corpus.

Whereas ... Yang et al. (2005) [Yang, X., Su, J., & Tan, C. L. (2005). Improving Pronoun Resolution Using Statistics-based Semantic Compatibility Information. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 165–172). Stroudsburg, PA, USA: Association for Computational Linguistics. http://doi.org/10.3115/1219840.1219861] use a "twin-candidate model". The two "web corpora" can't be compared (?). Yang et al. found consistent improvement when testing on the MUC corpus.

In an invited talk, Strube proclaimed: "forget about “semantics” - go to a math class - study algorithms". However, he was talking about "distributional approaches/semantic role labeling/WordNet/Wikipedia", not event structure per se.

Haghighi and Klein (2010) [Haghighi, A., & Klein, D. (2010). Coreference resolution in a modular, entity-centered model. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, 385–393. http://doi.org/10.3115/1608810.1608821]: "Our generative model exploits a large inventory of distributional entity types, including standard NER types like PERSON and ORG, as well as more refined types like WEAPON and VEHICLE. For each type, distributions over typical heads, modifiers, and governors are learned from large amounts of unlabeled data, capturing type-level semantic information (e.g. “spokesman” is a likely head for a PERSON)."

Lee et al. (2012) [Lee, H., Recasens, M., Chang, A., Surdeanu, M., & Jurafsky, D. (2012). Joint Entity and Event Coreference Resolution across Documents. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), 489–500.] ???

Rahman and Ng (2011) [Rahman, A., & Ng, V. (2011). Coreference Resolution with World Knowledge. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 814–824).]: "we encode the semantic roles of NPj and NPk as one of five possible values: ARG0-ARG0, ARG1-ARG1, ARG0-ARG1, ARG1-ARG0, and OTHERS (the default case)"

Rahman and Ng (2011): "the use of related verbs is similar in spirit to Bean and Riloff’s (2004) use of patterns for inducing contextual role knowledge, and the use of semantic roles is also discussed in Ponzetto and Strube (2006)."

Ponzetto and Strube (2006) [Ponzetto, S. P., & Strube, M. (2006). Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. NAACL 2006, 192–199. http://doi.org/10.3115/1220835.1220860] add two features for the semantic role argument-predicate pairs of each mention in a candidate pair. However, they use a linear classifier, so combinations of these features with other features are not taken into account, i.e. the model does not capture the compatibility between a mention and the role of the other mention.

Event structure (event-event)

Liu et al. (2016) [Liu, Q., Jiang, H., Evdokimov, A., Ling, Z., Zhu, X., Wei, S., & Hu, Y. (2015). Probabilistic Reasoning via Deep Learning: Neural Association Models.] use a neural net to model event-event association. They apply it to Winograd schemas but not to standard coreference resolution datasets.

Rahman and Ng (2011): "have the learner learn directly from coreference-annotated data whether two NPs serving as the objects of decry and denounce are likely to be coreferent or not, for instance... we create five binary-valued features by pairing each of these five values with the two stemmed predicates"

Surrounding context

Words in a window around mentions: Versley et al. (2008) [Versley, Y., Moschitti, A., Poesio, M., & Yang, X. (2008). Coreference Systems based on Kernels Methods. COLING 2008, 961–968.]
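
The role-pair encoding quoted from Rahman and Ng (2011) in the event-structure notes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not their implementation: the function names, the `|`-joined feature-name format, and the assumption that predicates arrive already stemmed are all hypothetical choices made for this example.

```python
from typing import Dict

# The five role-pair values listed in the Rahman and Ng (2011) quote.
ROLE_PAIR_VALUES = ("ARG0-ARG0", "ARG1-ARG1", "ARG0-ARG1", "ARG1-ARG0", "OTHERS")


def role_pair(role_j: str, role_k: str) -> str:
    """Map the semantic roles of two mentions to one of the five values."""
    value = f"{role_j}-{role_k}"
    # Any combination outside the four ARG0/ARG1 pairs falls back to the default.
    return value if value in ROLE_PAIR_VALUES else "OTHERS"


def role_predicate_features(
    role_j: str, pred_j: str, role_k: str, pred_k: str
) -> Dict[str, int]:
    """Pair each of the five role values with the two predicates.

    Assumption: pred_j and pred_k were stemmed upstream. The feature-name
    format is hypothetical. Exactly one of the five features is set to 1.
    """
    active = role_pair(role_j, role_k)
    return {
        f"{value}|{pred_j}|{pred_k}": int(value == active)
        for value in ROLE_PAIR_VALUES
    }


if __name__ == "__main__":
    # Two NPs that are both objects (ARG1) of "decry" and "denounce".
    example = role_predicate_features("ARG1", "decry", "ARG1", "denounce")
    for name, fired in example.items():
        print(name, fired)
```

Because the predicates are part of each feature name, a learner can weight the same role pair differently for different verb pairs, which is the kind of mention-role compatibility that a flat role feature cannot express on its own.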